Observations:
1. The data set contains both numerical and categorical columns.
2. normalized_used_price and normalized_new_price are continuous variables.
Observations:
1. The data set contains a total of 15 attributes and 3454 devices.
2. There are 11 numeric (float and int type) and 4 string (object type) columns in the data.
3. The target variable is normalized_used_price, of type float.
4. There are missing values in the columns main_camera_mp, selfie_camera_mp, int_memory, ram, battery, and weight.
Observations: normalized_new_price ranges from about 2.90 to 7.8, and normalized_used_price from about 1.5 to 6.6.
Observations: Both used and new prices are approximately normally distributed, with used prices slightly left-skewed.
Observations:
1. Used prices mostly fall in the range of 4 to 4.8, and new prices in the range of 4.8 to 5.6.
2. Outliers are present in both used and new prices.
Observations: Devices running the Android operating system account for 93% of the used-device market.
Observations:
1. OnePlus has the highest RAM.
2. Most brands average around 4 GB of RAM.
Observations:
1. Android devices weigh the least, followed by iOS devices.
2. Brands such as Google, Apple, Sony, and Lenovo have heavier devices.
3. The average device weight is around 300, and the interquartile range runs from about 200 to 480.
Observations:
1. Huawei has the largest number of devices with a screen size greater than 6 inches, followed by Samsung.
2. The average screen size is around 17 cm.
Observations: Huawei offers the largest number of devices with selfie cameras greater than 8 MP, followed by Vivo and Oppo.
Observations:
1. normalized_used_price: The distribution is slightly left-skewed, with many outliers, mostly on the low side. The mean is around 4.4.
2. normalized_new_price: The distribution is approximately normal, with outliers mostly on the high side. The mean is around 5.2.
3. weight: The distribution is right-skewed, with many outliers on the high side. The mean is around 160.
4. screen_size: The distribution is right-skewed, with outliers on both sides. The mean is around 12.8.
5. battery: The distribution is almost normal, with outliers on the high side. The mean is around 3000.
6. days_used: The distribution is left-skewed, with no outliers. The mean is around 690.
Observations: There is a positive linear relationship between the new price and the used price.
Observations:
1. The normalized used price has its strongest positive correlation with the normalized new price: the higher the price of a new phone, the higher the price of the used phone.
2. The used price is negatively correlated with days used: the more days a phone has been used, the lower its used price.
3. The used price is also moderately correlated with screen size, selfie camera, and battery.
4. battery, weight, and screen_size are correlated with each other.
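
As a rough illustration of what the correlation values above measure, the following sketch computes the Pearson correlation coefficient from scratch. The function name `pearson_r` and all numbers are hypothetical toy values chosen for illustration; they are not taken from the data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, NOT from the data set
used_price = [3.0, 3.5, 4.0, 4.5, 5.0]
new_price = [4.0, 4.5, 5.0, 5.5, 6.0]      # rises together with used price
days_used = [1000, 800, 600, 400, 200]     # falls as used price rises

r_new = pearson_r(used_price, new_price)   # close to +1
r_days = pearson_r(used_price, days_used)  # close to -1
print(round(r_new, 3), round(r_days, 3))
```

A value near +1 corresponds to the "higher new price, higher used price" pattern, and a value near -1 to the "more days used, lower used price" pattern.
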
Observations: Data is missing in features such as main camera, memory, and RAM. The missing values are handled by replacing them with the mean or median of the column.
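
A minimal sketch of median imputation, assuming missing entries are represented as `None`. The helper `impute_median` and the sample column are hypothetical; the original analysis may have imputed differently (for example, per brand or with the mean).

```python
from statistics import median

def impute_median(values):
    """Return a copy of `values` with None entries replaced by the median
    of the observed (non-missing) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries (not from the data set)
ram = [4.0, None, 6.0, 4.0, None, 8.0]
ram_imputed = impute_median(ram)
print(ram_imputed)
```

The median is usually preferred over the mean for skewed columns, because a few extreme values shift it much less.
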
Observations: There are many outliers in 9 of the 11 numeric columns. Given the lack of domain knowledge, it is prudent to treat these outliers. The diversity of phone brands could be one reason for the outliers.
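
One common way to flag and treat outliers is the 1.5 × IQR rule with capping at the fences. The sketch below assumes that approach; the source does not state which treatment was used, and the function names and sample values are hypothetical.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(values, k=1.5):
    """Clip every value to the IQR fences instead of dropping it."""
    lower, upper = iqr_fences(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical weights with one extreme value (not from the data set)
weights = [140, 150, 155, 160, 165, 170, 180, 500]
lower, upper = iqr_fences(weights)
outliers = [w for w in weights if w < lower or w > upper]
capped = cap_outliers(weights)
print(outliers, max(capped))
```

Capping keeps every row in the data set while limiting the leverage of extreme values, which matters when the outliers may be genuine devices rather than data errors.
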
Observations:
1. R-squared (0.849): The model explains approximately 85% of the variance in the normalized used price, which indicates a good fit.
2. Adjusted R-squared (0.846): The adjusted value is only slightly lower than R-squared.
3. F-statistic (277.4) and p-value (0.00): The very low p-value indicates that the model is statistically significant, meaning that at least one of the predictors is significantly related to the dependent variable (normalized used price).
4. Intercept: If all predictor variables are set to zero, the predicted normalized used price is around 1.5228.
5. Significant predictors such as normalized_new_price (coefficient of about 0.4), screen_size, and main_camera_mp contribute positively to the used price.
6. Many brand variables are not significant (p-value > 0.05); they do not appear to have a strong influence on the used price.
7. Some predictors, such as years_since_released, show the depreciation trend, with older phones having lower prices.
8. A Durbin-Watson statistic of about 2 implies that there is no autocorrelation in the residuals.
9. The high condition number implies the potential presence of multicollinearity.
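
Two of the summary statistics above follow simple formulas, sketched here under stated assumptions. The function names are hypothetical, and the sample sizes and residuals are toy values, since the source does not report the number of training rows or predictors.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals. Values near 2 suggest
    no first-order autocorrelation; near 0 positive, near 4 negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Toy inputs (hypothetical, not from the fitted model)
adj = adjusted_r2(0.5, n_obs=11, n_predictors=5)
dw_alternating = durbin_watson([1, -1, 1, -1])  # strong negative autocorrelation
dw_constant = durbin_watson([1, 1, 1, 1])       # strong positive autocorrelation
print(adj, dw_alternating, dw_constant)
```

With many observations relative to the number of predictors, the penalty factor is close to 1, which is why the adjusted value sits only slightly below R-squared.
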
Observations:
1. R-squared (0.832): The model explains about 83.2% of the variance in the target variable (normalized used price). This is a strong result, indicating that the model captures most of the important relationships between the independent variables and the dependent variable.
2. Adjusted R-squared (0.824): The adjusted value is slightly lower than R-squared but still strong at 82.4%.
3. RMSE (Root Mean Squared Error) (0.2389): This value is roughly the standard deviation of the residuals (the differences between the predicted and actual values).
4. MAE (Mean Absolute Error) (0.1887): This is the average absolute error between the predicted and actual values. A lower MAE means more accurate predictions, so this value suggests that the model's predictions are fairly accurate.
5. MAPE (Mean Absolute Percentage Error) (4.51%): On average, the model's predictions deviate from the actual values by about 4.5%. This is a low MAPE, indicating that the errors are small in percentage terms.
6. MSE (Mean Squared Error) (0.0571): This is the average squared error. A low value such as 0.0571 indicates a good fit.
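
The error metrics listed above can be sketched directly from their definitions. The helper `regression_metrics` and the toy actual/predicted values are hypothetical and are not the model's outputs.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired values.
    MAPE assumes no actual value is zero."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape}

# Toy values (hypothetical, not from the model)
actual = [4.0, 5.0, 2.0, 4.0]
predicted = [3.0, 5.0, 4.0, 4.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MSE is the square of RMSE, the two reported values should agree: 0.2389 squared is about 0.0571, which matches the MSE above.
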
In order to make statistical inferences from a linear regression model, it is important to ensure that the assumptions of linear regression are satisfied. We will be checking the following linear regression assumptions:
1. No multicollinearity
2. Linearity of variables
3. Independence of error terms
4. Normality of error terms
5. No heteroscedasticity
Observations:
1. A high VIF suggests multicollinearity.
2. Features such as brand_name_Others and os_iOS have a high VIF (> 10).
3. Certain features, such as brand_name_Huawei and brand_name_Samsung, have a moderate VIF.
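
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below illustrates this for the simplest case of a single other predictor; with more predictors the auxiliary regression would include all of them. Function names and numbers are hypothetical toy values.

```python
def r_squared_simple(y, x):
    """R-squared from a least-squares fit of y on one predictor x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def vif(r2_aux):
    """Variance inflation factor from the auxiliary regression's R-squared."""
    return 1 / (1 - r2_aux)

# Toy predictors (hypothetical, not from the data set)
x_other = [1, 2, 3, 4, 5]
x_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]  # almost a multiple of x_other
x_unrelated = [2, 1, 3, 1, 2]             # no linear relation to x_other

vif_high = vif(r_squared_simple(x_collinear, x_other))
vif_low = vif(r_squared_simple(x_unrelated, x_other))
print(round(vif_high, 1), round(vif_low, 1))
```

A predictor that is nearly a linear function of another gets a very large VIF, while an unrelated predictor gets a VIF of 1, the minimum possible value.
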
Observations: The VIF of every remaining feature is below 5, so multicollinearity has been handled.
Observations: There is no significant change in the adjusted R-squared value after deletion.
Observations: The selected_features list contains the features with p-values ≤ 0.05, which are considered statistically significant at the usual threshold.
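
A minimal sketch of the p-value filter described above, assuming the p-values are available as a mapping from feature name to value. The helper `select_significant`, the feature names, and the p-values are all hypothetical; in a full workflow the model would typically be refit after each removal, because p-values change when a feature is dropped.

```python
def select_significant(p_values, alpha=0.05):
    """Keep the features whose p-value is at or below the threshold `alpha`."""
    return [name for name, p in p_values.items() if p <= alpha]

# Hypothetical p-values for placeholder features (not model output)
p_values = {
    "feature_a": 0.001,
    "feature_b": 0.620,
    "feature_c": 0.050,  # exactly at the threshold, so it is kept
    "feature_d": 0.310,
}
selected_features = select_significant(p_values)
print(selected_features)
```
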
Observations: Although the features with high p-values were removed, there is no significant change in model performance.
Observations: No pattern is observed in the residual plot, which implies that the assumption of linearity is met.
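Observations: The test statistic is close to 1, which is consistent with normally distributed residuals, but the p-value of the test indicates that the residuals are not strictly normal (normality is rejected when the p-value is low, below 0.05, not when it is high). In the probability plot, the residuals look normal except in the tails, and the residual distribution looks normal but slightly left-skewed.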
Observations: Because the p-value is high (greater than 0.05), we fail to reject the null hypothesis of homoscedasticity, so the assumption of no heteroscedasticity is satisfied.
Observations: The actual and predicted values are comparable.
Model Evaluation:
1. R-squared = 0.83 and Adjusted R-squared = 0.82 suggest that the model explains around 83.3% of the variance in normalized_used_price. This is a strong fit, meaning most of the variation in the dependent variable is accounted for by the predictors.
2. The RMSE is relatively low, indicating that the model's predictions are fairly close to the actual values.
3. The MAE is also relatively low, showing that the model has a modest amount of error on average when predicting the normalized used price.
4. A MAPE of 4.54% indicates that, on average, the model's predictions are off by about 4.54% from the actual values, which is good performance.
5. The MSE is the square of the RMSE, so the two are consistent by construction and do not by themselves say anything about outliers.
6. The R-squared value is high and the RMSE and MSE are low, implying that the model is not underfitting.
7. The R-squared, RMSE, and MAE values of the train and test sets are comparable, implying that the model is not overfitting.
Insights from the model:
1. The predictors with low p-values are statistically significant.
2. Main camera megapixels (main_camera_mp), selfie camera megapixels (selfie_camera_mp), and RAM are all positively associated with the normalized used price.
3. Normalized new price (0.4185): For every 1-unit increase in the normalized new price, the normalized used price increases by 0.4185, holding the other predictors constant. This shows the strong impact of the new price on the used price.
4. Similarly, a 1 MP increase in the main camera increases the normalized used price by 0.0234.
5. Brand variables such as Asus, Celkon, Nokia, and Xiaomi are also significant, suggesting brand-specific effects on price.
Recommendations:
1. Recell should focus on features such as higher-megapixel cameras and larger storage space, which help a used phone retain its value.
2. 4G and 5G connectivity both significantly influence the used price, with 4G having a positive impact and 5G a negative impact, likely because 5G is at an early stage of implementation and may be seen as less valuable in used phones.
3. brand_name_Celkon has a significantly lower used price than other brands. Since these devices are not in demand, they should probably be discontinued.
4. Recell should focus more on devices running iOS or Windows.
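
The coefficient interpretation in the insights above amounts to multiplying a coefficient by the change in its feature while every other predictor is held fixed. The sketch below uses the two coefficients quoted in the text; the helper `predicted_change` is a hypothetical name introduced for illustration.

```python
# Coefficients quoted in the insights above
COEFFICIENTS = {
    "normalized_new_price": 0.4185,
    "main_camera_mp": 0.0234,
}

def predicted_change(feature, delta):
    """Change in the predicted normalized used price when `feature`
    changes by `delta` and all other predictors stay the same."""
    return COEFFICIENTS[feature] * delta

# A 4 MP better main camera, everything else equal
camera_effect = predicted_change("main_camera_mp", 4)
# A 1-unit higher normalized new price, everything else equal
price_effect = predicted_change("normalized_new_price", 1)
print(round(camera_effect, 4), round(price_effect, 4))
```

These changes are on the normalized price scale used by the model, not in currency units.
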